  • 50,620 views
  • Published 3 months ago by Jia-Bin Huang

How AI Taught Itself to See [DINOv3]

How can we train a general-purpose vision model to perceive our visual world? This video dives into the fascinating idea of self-supervised learning. We discuss the basic concepts of transfer learning, contrastive language-image pretraining (CLIP), and self-supervised learning methods, including masked autoencoders, contrastive methods like SimCLR, and self-distillation methods like DINOv1, v2, and v3. I hope you enjoy the video!

Chapters:
00:00 Introduction
00:33 Why do features matter?
01:11 Learning features using classification
02:14 Learning features using language (CLIP)
04:09 Learning features using pretext tasks (self-supervised learning)
05:20 Learning features using contrast (SimCLR)
06:36 Learning features using self-distillation (DINOv1)
12:18 DINOv2
13:54 DINOv3

References:
Language-image pretraining: [CLIP]
Self-supervised learning (pretext tasks): [Context encoder], [Colorization], [Rotation prediction], [Jigsaw puzzle], [Temporal order shuffling]
Contrastive learning: [SimCLR]
Inpainting: [MAE], [iBOT]
Self-distillation: [DINOv1], [DINOv2], [DINOv3]
Self-supervised learning: [Cookbook]

Video made with Manim:
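To make the self-distillation idea behind DINO concrete, here is a minimal sketch of its training loss: the student is trained to match the teacher's output distribution, where the teacher's outputs are centered and sharpened to avoid collapse, and the teacher's weights track the student via an exponential moving average. The function names, array shapes, and numeric defaults here are illustrative assumptions, not taken from the video.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy between sharpened teacher targets and student predictions.

    Teacher outputs are centered (subtracting a running mean `center`) and
    sharpened (low temperature) -- the two tricks DINO uses against collapse.
    Shapes: (batch, num_prototypes); values here are illustrative.
    """
    teacher_probs = softmax((teacher_logits - center) / t_teacher)
    student_logprobs = np.log(softmax(student_logits / t_student))
    return -(teacher_probs * student_logprobs).sum(axis=-1).mean()

def ema_update(teacher_w, student_w, momentum=0.996):
    # teacher weights are an exponential moving average of the student's
    return momentum * teacher_w + (1 - momentum) * student_w
```

In actual training there is no gradient through the teacher: only the student is updated by backpropagation, and the teacher follows via `ema_update`.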